Memory-Efficient Backpropagation


A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

Kusumoto, Mitsuru, Inoue, Takuya, Watanabe, Gentaro, Akiba, Takuya, Koyama, Masanori

Neural Information Processing Systems

Recomputation algorithms collectively refer to a family of methods that aim to reduce the memory consumption of backpropagation by selectively discarding intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous methods. We use the language of graph theory to formalize the general recomputation problem of minimizing computational overhead under a fixed memory budget, and provide a dynamic programming solution to the problem. Our method can reduce the peak memory consumption on various benchmark networks by 36%–81%, which outperforms the reduction achieved by other methods.
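
The paper's contribution is a graph-theoretic dynamic program that decides, for an arbitrary computation graph, which intermediate results to discard and recompute. As a rough illustration of the underlying recomputation idea (not the authors' solver), the sketch below uses PyTorch's torch.utils.checkpoint utilities with a hand-picked segmentation of a sequential model; the layer sizes and the choice of four segments are arbitrary placeholders.

```python
# Minimal sketch of recomputation (gradient checkpointing) in PyTorch.
# Activations inside each segment are discarded during the forward pass and
# recomputed when the backward pass needs them, trading extra compute for a
# lower peak memory footprint.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A toy chain of layers; depth and sizes are arbitrary for illustration.
model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(16)])
x = torch.randn(64, 1024, requires_grad=True)

# Keep only the activations at 4 segment boundaries; everything inside a
# segment is recomputed during the backward pass.
y = checkpoint_sequential(model, 4, x)
y.sum().backward()
```

The graph-theoretic formulation in the paper replaces this fixed, sequential segmentation with a memory-budgeted optimization over general network graphs.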


Memory-Efficient Backpropagation Through Time

Gruslys, Audrunas, Munos, Remi, Danihelka, Ivo, Lanctot, Marc, Graves, Alex

Neural Information Processing Systems

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching intermediate results and recomputing them. The algorithm can fit tightly within almost any user-set memory budget while finding an execution policy that minimizes computational cost. Computational devices have limited memory capacity, and maximizing computational performance under a fixed memory budget is a practical use case. We provide asymptotic upper bounds on computation for various regimes. The algorithm is particularly effective for long sequences: for sequences of length 1000, it saves 95% of memory usage while using only one third more time per iteration than standard BPTT.
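
The caching-versus-recomputation trade-off can be illustrated with the simplest fixed-interval policy: checkpoint the hidden state every ~sqrt(T) steps and recompute the states inside a segment during the backward pass, which stores O(sqrt(T)) states instead of O(T). The paper's dynamic program generalizes this to an optimal policy for any memory budget; the sketch below is only the fixed-interval special case, and `cell` and `cell_grad` are hypothetical stand-ins for an RNN step and its one-step gradient.

```python
# Sketch of BPTT with sqrt(T) checkpointing: a fixed-interval special case of
# the caching/recomputation trade-off (the paper's DP chooses the policy
# optimally under an arbitrary memory budget).
import math

def bptt_checkpointed(cell, cell_grad, h0, inputs, dL_dhT):
    """cell(h, x) -> next state; cell_grad(h, x, dh_next) -> dh (hypothetical)."""
    T = len(inputs)
    k = max(1, math.isqrt(T))                  # checkpoint interval ~ sqrt(T)

    # Forward pass: keep only every k-th hidden state (plus the initial one).
    saved, h = {}, h0
    for t in range(T):
        if t % k == 0:
            saved[t] = h                       # state h_t before consuming x_t
        h = cell(h, inputs[t])

    # Backward pass: walk the segments from last to first, recomputing the
    # discarded states of a segment just before backpropagating through it.
    dh = dL_dhT
    for b in range(((T - 1) // k) * k, -1, -k):
        end = min(b + k, T)
        hs = [saved[b]]
        for t in range(b, end):                # recompute h_{b+1}, ..., h_end
            hs.append(cell(hs[-1], inputs[t]))
        for t in range(end - 1, b - 1, -1):    # local backprop over the segment
            dh = cell_grad(hs[t - b], inputs[t], dh)
    return dh                                  # gradient w.r.t. the initial state
```

Only about T/k saved states plus k recomputed states are held at any time; the paper's DP instead searches over all checkpoint placements (including nested recomputation) to minimize recomputation cost for whatever memory budget the user sets.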


Reviews: A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

Neural Information Processing Systems

The paper proposes a method that reduces the memory consumption of backpropagation and, as a result, allows the use of larger batch sizes. The authors state that this is significant, e.g., with batch norm, where batch size matters. In my own experience, I believe this is also significant for improved GPU utilization and data-parallel training. The paper is original in that I have not seen a similar treatment (although the actual solution may have been relatively straightforward; asking the right question was an important part of this). The paper is well written and easy to follow.


Reviews: A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

Neural Information Processing Systems

This work formalizes the problem of minimizing memory consumption through recomputation when performing a forward-backward evaluation of a computation graph; it provides an optimal dynamic programming algorithm and an efficient heuristic, and demonstrates strong improvements in memory savings over existing methods. Three expert reviewers initially assessed the paper as 8/8/5, and the authors provided a detailed rebuttal. All reviewers took part in a discussion, and the final assessment is 8/8/6, with reviewer consensus that the method is practically useful and the reported gains are strong. Overall this work is a nice contribution.


Reviews: Memory-Efficient Backpropagation Through Time

Neural Information Processing Systems

The authors are solving an important problem. RNN training procedures can be greedy for memory, and, given the sequential nature of the computation, it is not trivial to simply scale the training of each sequence across many machines. As a result, it is important to use memory and computational resources judiciously to train RNNs efficiently. I'm pleased to see the authors not only proposing a new instance of a solution, but also providing a user-selectable tradeoff between the amount of computation and the memory usage.


Memory-Efficient Backpropagation through Large Linear Layers

Bershatsky, Daniel, Mikhalev, Aleksandr, Katrutsa, Alexandr, Gusak, Julia, Merkulov, Daniil, Oseledets, Ivan

arXiv.org Machine Learning

In modern neural networks such as Transformers, linear layers require significant memory to store activations during the backward pass. This study proposes a memory reduction approach for backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplication and demonstrate that they require less memory with only a moderate decrease in test accuracy. We also investigate the variance of the gradient estimate induced by the randomized matrix multiplication, and compare it with the variance coming from gradient estimation based on the batch of samples. We demonstrate the benefits of the proposed method on fine-tuning the pre-trained RoBERTa model on GLUE tasks.
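
For a linear layer Y = XW, the weight gradient is the matrix product X^T (dL/dY), so the memory cost comes from keeping the full activation matrix X until the backward pass. A generic way to randomize this product, shown below as a rough illustration of the idea (not necessarily the exact estimator studied in the paper), is to keep only a rescaled sample of the rows of X chosen from forward-pass information; `sampled_grad_w` is a hypothetical name for such an estimator.

```python
# Sketch of a randomized (sampled) matrix product for the weight gradient of a
# linear layer Y = X W.  Exactly, grad_W = X.T @ grad_Y, which requires storing
# the full activation matrix X.  Sampling k rows of X at forward time (with
# probabilities that depend only on X) lets the remaining rows be discarded,
# at the price of variance in the gradient estimate.
import numpy as np

def sampled_grad_w(X_kept, grad_Y_rows, scale):
    """Unbiased estimate of X.T @ grad_Y from the kept, rescaled rows."""
    return (X_kept * scale[:, None]).T @ grad_Y_rows

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 64))        # activations (batch x in_features)
grad_Y = rng.standard_normal((512, 32))   # output gradient (batch x out_features)

# Forward time: sample k rows with probability proportional to their norm and
# keep only those rows (the rest of X can be freed before the backward pass).
p = np.linalg.norm(X, axis=1)
p = p / p.sum()
k = 128
idx = rng.choice(X.shape[0], size=k, replace=True, p=p)
scale = 1.0 / (k * p[idx])                # importance-sampling weights
X_kept = X[idx]

# Backward time: the sampled product approximates the exact weight gradient.
approx = sampled_grad_w(X_kept, grad_Y[idx], scale)
exact = X.T @ grad_Y
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The abstract's variance analysis concerns exactly this kind of estimator: the sampling adds noise to the gradient, which the paper compares against the noise already present from using a minibatch of samples.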

